Data Frame

ETL.R

Global Economic Data

“The Penn World table contains national income and purchasing power data for 189 countries and territories for available years between 1950 and 2009. The data has been converted into common currency and its updated on an annual basis. Funding for this research is provided by the National Science Foundation (NSF).”

Bar Charts

Using Tableau (Low Consumption Percentage)

Using Tableau (High Consumption Percentage)

  • Columns: SUM(Consumption Percentage of Real GDP)
  • Rows: Country
  • Filters: Year: 2009
  • Label: SUM(Consumption Percentage of Real GDP)

**We created two bar charts in Tableau to capture both ends of the consumption percentages (high and low)

Using R

source("../01 Data/barchart.R", echo = TRUE)
## 
## > require("jsonlite")
## Loading required package: jsonlite
## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:utils':
## 
##     View
## 
## > require("RCurl")
## Loading required package: RCurl
## Loading required package: bitops
## 
## > require(ggplot2)
## Loading required package: ggplot2
## 
## > require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## > df <- data.frame(fromJSON(getURL(URLencode(gsub("\n", 
## +     " ", "skipper.cs.utexas.edu:5001/rest/native/?query=\n\"select country, consumption_per .... [TRUNCATED] 
## 
## > df <- dplyr::filter(df, YEAR == 2009)
## 
## > df$COUNTRY <- factor(df$COUNTRY, levels = df$COUNTRY[order(desc(df$CONSUMPTION_PERCENTAGE))])
## 
## > ggplot() + geom_bar() + coord_flip() + scale_x_discrete() + 
## +     scale_y_discrete() + labs(title = "Consumption Percentage per Country in 2009") + .... [TRUNCATED]

  • The average consumption percentage of GDP is 70% for 2009, the year after the stock market crash.
  • Equatorial Guinea had a consumption percentage of 9.5% of their GDP, while Liberia had a percentage of 192.4%.

Scatter Plots

Using Tableau (Low Population)

  • Columns: Population
  • Rows: Real GDP
  • Filters: Population: 7-266,859; Color: Year
  • Label: Country

  • The color ranged from red to green, 1950 to 2009 respectively. Using these colors, the visualizations depict the population and Real GDP growth rates over time.
  • Italy, France, and Germany’s population growth rates plateaued around the 1980s.
  • Japan’s Real GDP rate increased more significatly than any other country in this population range.
  • Pakistan and Indonisia’s population rate are continuing to grow, while their economies did not not expand as much as the other countries.

Using R

source("../01 Data/scatterplot.R", echo = TRUE)
## 
## > require("jsonlite")
## 
## > require("RCurl")
## 
## > require(ggplot2)
## 
## > require(dplyr)
## 
## > df <- data.frame(fromJSON(getURL(URLencode(gsub("\n", 
## +     " ", "skipper.cs.utexas.edu:5001/rest/native/?query=\n\"select POP, REAL_GDP, YEAR, COU .... [TRUNCATED] 
## 
## > df <- dplyr::filter(df, POP < 266859)
## 
## > ggplot() + geom_point() + labs(title = "Real GDP Versus Population") + 
## +     labs(x = paste("Pop"), y = paste("Real GDP")) + layer(data = df, 
## +    .... [TRUNCATED]

Something extra! (High Population)

**We found some interesting differences in GDP growth between countries with low populations and countries with high population, and decided to include it here.

  • Columns: Population
  • Rows: Real GDP
  • Filters: Population: 320,229-1,323,592; Color: Year
  • Label: Country

  • These countries were isolated from the other plot due to higher populations.
  • India and China’s Real GDP and population grew immensely in the 21st century.
  • China’s Real GDP has doubled in the past decade.

Cross Tabs

Using Tableau

  • Columns: Year
  • Rows: Country
  • Filters: Year: 2001-2009, Color: AGG(KPI)
  • Label: Real Gross Domestic Income

  • Real Gross Domestic Income of countries from 2001-2009.
  • Low Key Performance Indicator (Real Gross Domestic Income) <= $1,930.22
  • Medium Key Performance Indicator (Real Gross Domestic Income) <= $7108.52
  • High Key Performance Indicator (Real Gross Domestic Income) > $7108.52
  • Azerbaijan’s real gross domestic income tripled in under a decade.
  • Belarus’ real gross domestic income doubled in under a decade.

Using R

source("../01 Data/cross_tab.R", echo = TRUE)
## 
## > require("jsonlite")
## 
## > require("RCurl")
## 
## > require(ggplot2)
## 
## > require(dplyr)
## 
## > KPI_Low_Max_value = 1930.22
## 
## > KPI_Medium_Max_value = 4108.52
## 
## > df <- data.frame(fromJSON(getURL(URLencode(gsub("\n", 
## +     " ", "skipper.cs.utexas.edu:5001/rest/native/?query=\n\"select country, real_gross_dome .... [TRUNCATED] 
## 
## > df <- dplyr::filter(df, YEAR > 2000)
## 
## > df <- dplyr::filter(df, YEAR < 2010)
## 
## > df <- df %>% mutate(ratio = REAL_GROSS_DOMESTIC_INCOME) %>% 
## +     mutate(KPI = ifelse(ratio <= KPI_Low_Max_value, "03 Low", 
## +         ifelse(ratio .... [TRUNCATED] 
## 
## > ggplot() + coord_cartesian() + scale_y_discrete() + 
## +     labs(title = "KPI of Countries from 2001-2009") + labs(x = paste("Year"), 
## +     y = past .... [TRUNCATED]

Calculated Fields and Parameters

  • Examples of the setting and adjustments used to create the visualizations.

Blending Data Sources

Using Tableau

Using R

source("../01 Data/blended_data.R", echo = TRUE)
## 
## > require("jsonlite")
## 
## > require("RCurl")
## 
## > require(dplyr)
## 
## > global_economics_1 <- data.frame(fromJSON(getURL(URLencode(gsub("\n", 
## +     " ", "skipper.cs.utexas.edu:5001/rest/native/?query=\n\"select * from g .... [TRUNCATED] 
## 
## > global_economics_2 <- data.frame(fromJSON(getURL(URLencode(gsub("\n", 
## +     " ", "skipper.cs.utexas.edu:5001/rest/native/?query=\n\"select * from g .... [TRUNCATED] 
## 
## > dplyr::inner_join(global_economics_1, global_economics_2, 
## +     by = "COUNTRY") %>% View
## Warning in inner_join_impl(x, y, by$x, by$y): joining factors with
## different levels, coercing to character vector
## 
## > blended <- data.frame(fromJSON(getURL(URLencode(gsub("\n", 
## +     " ", "skipper.cs.utexas.edu:5001/rest/native/?query=\n\"select * \nfrom global_eco .... [TRUNCATED]

*The new columns start with “Country_Isocode”

Citation: https://app.enigma.io/table/edu.upenn.econ.pwt?row=0&col=30&page=1